Abstract

Radial Basis Function Neural Networks (RBFNNs) are tools widely used in regression problems. One of their principal drawbacks
is that the formulation corresponding to training with supervision of both the centers and the weights is a highly non-convex
optimization problem, which leads to some fundamental difficulties for traditional optimization theory and methods. This paper
presents a generalized canonical duality theory for solving this challenging problem. We demonstrate that by sequential canonical
dual transformations, the nonconvex optimization problem of the RBFNN can be reformulated as a canonical dual problem (without
duality gap). Both the global optimal solution and local extrema can be classified. Several applications to one of the most used Radial
Basis Functions, the Gaussian function, are illustrated. Our results show that even in the one-dimensional case, the global minimizer
of the nonconvex problem may not be the best solution to the RBFNN, and that the canonical dual theory is a promising tool for solving
general neural network training problems.
1. Introduction

Radial Basis Function Neural Networks (RBFNNs) are a tool
introduced in the field of function interpolation [1] and later
adapted to the problem of regression [2]. During the last
two decades RBFNNs have been applied in several fields. The
problem of regression consists in trying to approximate a function
$f : \mathbb{R}^n \to \mathbb{R}$ by means of an approximating function $g(\cdot)$ that
uses a set of samples defined as:

$$\{(x^p, y^p),\ x^p \in \mathbb{R}^n,\ y^p \in \mathbb{R},\ p = 1, \dots, P\} \tag{1}$$

where $(x^p, y^p)$ are respectively arguments and values of the
given function $f(x)$. In general the approximating function $g(\cdot)$
obtained by the RBFNNs with radial basis function $\phi(\cdot)$ has the
following form:

$$g(x) = \sum_{i=1}^{N} w_i \, \phi(\|x - c_i\|^2) \tag{2}$$

where $N$ is the number of units used to approximate the function,
or neurons of the network, $w$ is the vector with components
$w_i$ for $i = 1, \dots, N$, that is the vector of the weights
associated with the connections between the units, and $c_i \in \mathbb{R}^n$ for
$i = 1, \dots, N$ are the centers of the RBFNN.
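The network output in (2) can be sketched in a few lines of Python. This is only an illustration: the Gaussian basis, the width parameter `a` and the function names `gaussian_rbf` and `rbfnn` are assumptions made for the example, not notation from the paper.

```python
import math

def gaussian_rbf(r2, a=1.0):
    """Gaussian basis evaluated at a squared distance r2 (width `a` is assumed)."""
    return math.exp(-r2 / (2.0 * a * a))

def rbfnn(x, weights, centers, phi=gaussian_rbf):
    """Evaluate g(x) = sum_i w_i * phi(||x - c_i||^2) for a point x in R^n."""
    total = 0.0
    for w_i, c_i in zip(weights, centers):
        # Squared Euclidean distance between the input and the i-th center.
        r2 = sum((xj - cj) ** 2 for xj, cj in zip(x, c_i))
        total += w_i * phi(r2)
    return total

# Two units in R^2: one centered on the input, one at distance 1 from it.
value = rbfnn([0.0, 0.0], weights=[2.0, -1.0], centers=[[0.0, 0.0], [1.0, 0.0]])
```

Passing `phi` as an argument keeps the sketch usable for other radial basis functions, since (2) does not fix the choice of $\phi$.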

Generally speaking, there are two main optimization strategies
to train an RBFNN. The first consists in the optimization
of only the weights of the neural network. In this case the centers
are generally chosen by using clustering strategies [3]. This
problem is convex in the variable $w$ and has the form:

$$E(w) = \frac{1}{2} \sum_{p=1}^{P} \Big( \sum_{i=1}^{N} w_i \, \phi(\|x^p - c_i\|^2) - y^p \Big)^2 + \frac{\beta_w}{2} \|w\|^2 \tag{3}$$

where $\beta_w$ is the regularization parameter for the weights.

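The second strategy is to consider both the weights $w$ and the
centers $c$ of the radial basis functions as variables. This strategy
can be performed by solving the following unconstrained
optimization problem:

$$E(w, c) = \frac{1}{2} \sum_{p=1}^{P} \Big( \sum_{i=1}^{N} w_i \, \phi(\|x^p - c_i\|^2) - y^p \Big)^2 + \frac{\beta_w}{2} \|w\|^2 + \frac{\beta_c}{2} \sum_{i=1}^{N} \|c_i\|^2 \tag{4}$$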
This problem is non-convex, but empirical experiments show
that it generally yields neural networks with a
higher precision than the ones trained with strategy (3). One of
the most used strategies to solve this optimization problem is to
apply decomposition algorithms [?]. However, due to the
non-convexity of problem (4), there are some fundamental difficulties
in finding the global minimum of the problem and in characterizing
local minima. Indeed, problem (4) is considered to
be NP-hard even if the radial basis function $\phi(\cdot)$ is a quadratic
function and $n = 1$. Another issue that characterizes this
problem is the choice of the regularization parameters $\beta_w$ and
$\beta_c$. In general a cross-validation strategy is applied in order to
find these regularization parameters. Cross-validation consists
in trying different values of the parameters in order to find the
one that yields the neural network with the best prediction. Until
now it has not been possible to find a closed form for the optimal
values of these parameters in the general case. If it were possible
to find at least an upper bound for these parameters, the time
needed to perform a cross-validation would greatly decrease.

Canonical duality theory, developed from nonconvex analysis
and global optimization [?], is a potentially powerful methodology,
which has been used successfully for solving a large
class of challenging problems in biology, engineering, sciences
[?], and recently in network communications [11, 13].
In this paper we study the canonical duality theory for solving
the general Radial Basis Neural Networks optimization problem
(4) and mainly analyze the one-dimensional case in order to
find properties and intuitions that can be useful for the
multidimensional cases. The rest of this paper is arranged as follows.
In Section 2, we first demonstrate how to rewrite the nonconvex
primal problem as a dual problem by using the sequential canonical
dual transformation developed in [8, 12]. In Section 3 we prove
the complementarity-dual principle, showing that the obtained
formulation is canonically dual to the original problem in the
sense that there is no duality gap. In Section 4, we analyze the
problem with the Gaussian function as radial basis in the neurons
and show some examples. The last section presents some
conclusions.



2. Primal problem for general Radial Basis Functions (RBF)

The general one-dimensional non-convex function to be addressed
in this paper can be proposed in the following form:

$$P(c) = W(c) + \frac{1}{2} \beta c^2 - f c, \tag{5}$$

where $\beta$ is the regularization coefficient and $f$ is a positive
scalar close to zero. The term $-fc$ is not included in the original
Radial Basis Neural Networks formulation but we consider
it for the general mathematical case. The non-convex function
$W(c)$ depends on the choice of the radial basis function $\phi(\cdot)$:

$$W(c) = \frac{1}{2} \big( w \phi(\|x - c\|^2) - y \big)^2 \tag{6}$$

where $x$, $y$ and $w$ belong to $\mathbb{R}$. In applications the parameter
$w$ is also a variable, but the original problem (2) is convex in
$w$ while non-convex with respect to the center $c$ of the radial basis
function. Therefore, the one-dimensional non-convex primal
problem can be formulated as

$$(\mathcal{P}): \quad \min \Big\{ P(c) = \frac{1}{2} \big( w \phi(\|x - c\|^2) - y \big)^2 + \frac{1}{2} \beta c^2 - f c \ \Big|\ \forall c \in \mathbb{R} \Big\} \tag{7}$$
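The one-dimensional primal function (7) is easy to evaluate numerically, which makes its non-convexity visible on a concrete instance. The sketch below is illustrative only: the default basis $\phi(r) = e^{-r}$ and the parameter values are assumptions made for the example.

```python
import math

def primal(c, x, y, w, beta, f=0.0, phi=lambda r2: math.exp(-r2)):
    """P(c) = 0.5*(w*phi((x-c)^2) - y)^2 + 0.5*beta*c^2 - f*c, as in (7)."""
    xi = w * phi((x - c) ** 2)
    return 0.5 * (xi - y) ** 2 + 0.5 * beta * c * c - f * c

# Assumed example parameters (not prescribed by the text).
params = dict(x=1.0, y=1.0, w=2.0, beta=0.1)

left = primal(0.2, **params)
mid = primal(1.0, **params)    # c = 1.0 is the midpoint of 0.2 and 1.8
right = primal(1.8, **params)

# A convex function would satisfy mid <= (left + right) / 2; here it does not.
violates_convexity = mid > 0.5 * (left + right)
```

For these values the midpoint lies on a hump between two wells of $P$, so the convexity inequality fails.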



In order to apply the canonical duality theory to solve this
problem, we need to choose the following geometrically nonlinear
operator:

$$\xi = \Lambda(c) = w \phi(\|x - c\|^2) \tag{8}$$

Clearly, this is a nonlinear map from $\mathbb{R}$ to a subspace $\mathcal{E}_a \subset \mathbb{R}$,
which depends on the choice of the Radial Basis Function
$\phi(\cdot)$. The canonical function associated with this geometrical
operator is

$$V(\xi) = \frac{1}{2} (\xi - y)^2, \quad \text{so that} \quad W(c) = V(\Lambda(c)). \tag{9}$$

By the definition introduced in the canonical duality theory [?],
$V : \mathcal{E}_a \to \mathbb{R}$ is said to be a canonical function on $\mathcal{E}_a$ if for any
given $\xi \in \mathcal{E}_a$, the duality relation

$$\sigma = V'(\xi) = \xi - y : \mathcal{E}_a \to \mathcal{S}_a \tag{10}$$

is invertible, where $\mathcal{S}_a$ is the range of the duality mapping
$\sigma = \partial V(\xi)$, which depends on the choice of the Radial Basis
Function $\phi(\cdot)$. The couple $(\xi, \sigma)$ forms a canonical duality pair
on $\mathcal{E}_a \times \mathcal{S}_a$, with the Legendre conjugate $V^*(\sigma)$ defined by

$$V^*(\sigma) = \{ \xi \sigma - V(\xi) \mid \sigma = V'(\xi) \} = \frac{1}{2} \sigma^2 + y \sigma \tag{11}$$
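The conjugate pair $V$, $V^*$ can be checked numerically. The following minimal sketch (function names are chosen here for illustration) evaluates the Fenchel-Young gap $V(\xi) + V^*(\sigma) - \xi\sigma$, which for this quadratic pair equals $\frac{1}{2}(\xi - y - \sigma)^2$ and therefore vanishes exactly when the duality relation $\sigma = \xi - y$ holds.

```python
def V(xi, y):
    """Canonical function V(xi) = 0.5*(xi - y)^2."""
    return 0.5 * (xi - y) ** 2

def V_star(sigma, y):
    """Legendre conjugate V*(sigma) = 0.5*sigma^2 + y*sigma."""
    return 0.5 * sigma ** 2 + y * sigma

def fenchel_gap(xi, sigma, y):
    """V(xi) + V*(sigma) - xi*sigma; zero iff sigma = xi - y."""
    return V(xi, y) + V_star(sigma, y) - xi * sigma

gap_on_relation = fenchel_gap(1.3, 1.3 - 1.0, 1.0)   # sigma = xi - y
gap_off_relation = fenchel_gap(1.3, 0.8, 1.0)        # sigma != xi - y
```

This is the identity that allows $W(c)$ to be replaced by $\Lambda(c)\sigma - V^*(\sigma)$ along the duality relation.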



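By considering that $W(c) = \Lambda(c)\sigma - V^*(\sigma)$, the primal function
$P(c)$ can be reformulated as the so-called total complementarity
function defined by

$$\Xi(c, \sigma) = \Lambda(c)\sigma - V^*(\sigma) + \frac{1}{2} \beta c^2 - f c
= w \phi(\|x - c\|^2)\sigma - \Big( \frac{1}{2} \sigma^2 + \sigma y \Big) + \frac{1}{2} \beta c^2 - f c \tag{12}$$

The function $\phi(\cdot)$ can be a non-convex function just like $W(c)$.
For this reason we have to perform a sequential canonical dual
transformation for the nonlinear operator $\Lambda(c)$. To this aim we
choose a second nonlinear operator:

$$\epsilon = \Lambda_2(c) = \|x - c\|^2 \tag{13}$$

which is a map from $\mathbb{R}$ to $\mathcal{E}_b = \{ \epsilon \in \mathbb{R} \mid \epsilon \geq 0 \}$. In terms of $\epsilon$, the
first level operator $\xi = \Lambda(c)$ can be written as

$$\xi = U(\epsilon) = w \phi(\epsilon). \tag{14}$$

We assume that $U(\epsilon)$ is a convex function on $\mathcal{E}_b$ such that the
second-level duality relation

$$\tau = U'(\epsilon) = w \phi'(\epsilon) \tag{15}$$

is invertible, i.e.,

$$\epsilon = (\phi')^{-1}\Big( \frac{\tau}{w} \Big) \tag{16}$$

where the term $(\phi')^{-1}$ is the inverse of the function $\phi'(\epsilon)$.

Thus, the Legendre conjugate of $U$ can be obtained uniquely
by

$$U^*(\tau) = \{ \epsilon \tau - U(\epsilon) \mid \tau = U'(\epsilon) \}
= \tau \, (\phi')^{-1}\Big( \frac{\tau}{w} \Big) - w \phi\Big( (\phi')^{-1}\Big( \frac{\tau}{w} \Big) \Big) \tag{17}$$

We notice that $\xi = w\phi(\epsilon)$. By substituting the value of $\epsilon$ given
by (16) we find a relation that connects the first level primal
variable $\xi$ with the second level dual variable $\tau$:

$$\xi = w \phi\Big( (\phi')^{-1}\Big( \frac{\tau}{w} \Big) \Big) \tag{18}$$

By plugging this in (10) we obtain

$$\sigma = w \phi\Big( (\phi')^{-1}\Big( \frac{\tau}{w} \Big) \Big) - y. \tag{19}$$

Generally speaking, it is possible, for certain functions $\phi$, to use
the canonical dual transformation to find the relation between
the first level dual variable $\sigma$ and the second level dual variable
$\tau$ by means of the derivatives of $\phi(\cdot)$ and the first primal variable
$\xi$. In general this relation is:

$$\tau = w \phi'\Big( \phi^{-1}\Big( \frac{\sigma + y}{w} \Big) \Big) \tag{20}$$

Therefore, replacing $U(\epsilon) = \Lambda(c)$ by its Legendre conjugate
$U^*$, the total complementarity function becomes

$$\Xi(c, \sigma, \tau) = \big( \|x - c\|^2 \tau - U^*(\tau) \big) \sigma - V^*(\sigma) + \frac{1}{2} \beta c^2 - f c. \tag{21}$$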



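It is also possible to rewrite the total complementarity function
(21) in the following form:

$$\Xi(c, \sigma, \tau) = \frac{1}{2} c^2 (2\tau\sigma + \beta) - c (2\tau\sigma x + f) - U^*(\tau)\sigma - V^*(\sigma) + x^2 \tau \sigma. \tag{22}$$

By the criticality condition $\partial \Xi(c, \sigma, \tau) / \partial c = 0$ we obtain

$$c(\tau, \sigma) = \frac{2\tau x \sigma + f}{2\tau\sigma + \beta}. \tag{23}$$

Clearly, if $2\tau\sigma + \beta \neq 0$ the general solution of (23) is

$$c = \frac{2\tau x \sigma + f}{2\tau\sigma + \beta} \quad \forall (\sigma, \tau) \in \mathcal{S}_a = \{ \sigma, \tau \mid 2\tau\sigma + \beta \neq 0 \} \tag{24}$$

and the canonical dual function of $P(c)$ can be presented as

$$P^d(\sigma, \tau) = -\frac{1}{2} \frac{(2\tau x \sigma + f)^2}{2\tau\sigma + \beta} - U^*(\tau)\sigma - V^*(\sigma) + x^2 \tau \sigma. \tag{25}$$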
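The elimination of $c$ that leads to the canonical dual function (25) can be written out in one short computation. The shorthand $G = 2\tau\sigma + \beta$ and $F = 2\tau x\sigma + f$ is used here only for brevity; it matches the functions $G(\sigma)$ and $F(\sigma)$ that appear once $\tau$ is expressed through $\sigma$.

```latex
\begin{aligned}
\Xi(c,\sigma,\tau) &= \tfrac{1}{2} G c^2 - F c - U^*(\tau)\sigma - V^*(\sigma) + x^2 \tau \sigma, \\
\frac{\partial \Xi}{\partial c} = G c - F = 0 \;&\Longrightarrow\; c = \frac{F}{G} \quad (G \neq 0), \\
\Big( \tfrac{1}{2} G c^2 - F c \Big) \Big|_{c = F/G} &= \frac{1}{2}\frac{F^2}{G} - \frac{F^2}{G} = -\frac{1}{2}\frac{F^2}{G}.
\end{aligned}
```

Substituting the last line back into $\Xi$ gives exactly the first term of the dual function, while the remaining terms do not depend on $c$ and are carried over unchanged.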



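By considering the dual relation given in (20), and by setting
$s(\sigma) = \frac{\sigma + y}{w}$, we can write the total complementarity function
in terms of only $c$ and $\sigma$:

$$\Xi(c, \sigma) = \frac{1}{2} c^2 G(\sigma) - c F(\sigma) - U^*(\sigma)\sigma - V^*(\sigma) + x^2 w \phi'\big( \phi^{-1}(s(\sigma)) \big) \sigma, \tag{26}$$

where

$$G(\sigma) = 2 w \phi'\big( \phi^{-1}(s(\sigma)) \big) \sigma + \beta,$$
$$F(\sigma) = 2 w \phi'\big( \phi^{-1}(s(\sigma)) \big) x \sigma + f,$$
$$U^*(\sigma) = w \phi'\big( \phi^{-1}(s(\sigma)) \big) \phi^{-1}(s(\sigma)) - (\sigma + y).$$

Therefore, in terms of $\sigma$ only, the canonical dual function can
be written as

$$P^d(\sigma) = -\frac{1}{2} \frac{F(\sigma)^2}{G(\sigma)} - U^*(\sigma)\sigma - V^*(\sigma) + x^2 w \phi'\big( \phi^{-1}(s(\sigma)) \big) \sigma. \tag{27}$$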



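3. Complementary-Dual Principle

Theorem 3.1. If $\bar\sigma$ is a critical point of $(\mathcal{P}^d)$ and the term

$$G'(\bar\sigma) = w \bar\sigma \, \phi''\big( \phi^{-1}(s(\bar\sigma)) \big) \big( \phi^{-1}(s(\bar\sigma)) \big)' + w \phi'\big( \phi^{-1}(s(\bar\sigma)) \big) \neq 0, \tag{28}$$

then the point

$$\bar c = \frac{F(\bar\sigma)}{G(\bar\sigma)} \tag{29}$$

is a critical point of $P(c)$ and $P(\bar c) = P^d(\bar\sigma)$.

Proof 3.1. Suppose that $\bar\sigma$ is a critical point of $P^d$; then we have

$$P^d(\bar\sigma)' = \big[ \bar c^2 - 2 x \bar c + x^2 - \phi^{-1}(s(\bar\sigma)) \big] G'(\bar\sigma) - \bar\sigma \big[ w \phi'\big( \phi^{-1}(s(\bar\sigma)) \big) \big( \phi^{-1}(s(\bar\sigma)) \big)' - 1 \big] = 0. \tag{30}$$

Notice that

$$\big( \phi^{-1}(s(\bar\sigma)) \big)' = \frac{1}{w \phi'(\epsilon)} = \frac{1}{w \phi'\big( \phi^{-1}(s(\bar\sigma)) \big)}. \tag{31}$$

The last term in (30) is therefore zero. The term $G'(\bar\sigma)$ is not zero from
the hypothesis, so we obtain

$$(x - \bar c)^2 - \phi^{-1}(s(\bar\sigma)) = 0, \tag{32}$$

that is

$$\bar\sigma = w \phi(\|x - \bar c\|^2) - y. \tag{33}$$

The critical point condition for the primal problem, $P'(c) = 0$, is

$$-2 w (x - c) \phi'(\|x - c\|^2) \big( w \phi(\|x - c\|^2) - y \big) + \beta c - f = 0. \tag{34}$$

By considering that $\phi'(\|x - c\|^2) = \phi'\big( \phi^{-1}(s(\sigma)) \big)$ and $\sigma = w \phi((x - c)^2) - y$ we obtain

$$-2 w (x - c) \phi'\big( \phi^{-1}(s(\sigma)) \big) \sigma + \beta c - f = 0, \tag{35}$$

that is

$$c = \frac{2 w \phi'\big( \phi^{-1}(s(\sigma)) \big) x \sigma + f}{2 w \phi'\big( \phi^{-1}(s(\sigma)) \big) \sigma + \beta}. \tag{36}$$

By setting $\sigma = \bar\sigma$ in (36) we obtain (24), proving that $\bar c$ is a
critical point of $P(c)$.

For the correspondence of the function values we start from the
dual function

$$P^d(\bar\sigma) = -\frac{1}{2} \frac{F(\bar\sigma)^2}{G(\bar\sigma)} - U^*(\bar\sigma)\bar\sigma - V^*(\bar\sigma) + x^2 w \phi'\big( \phi^{-1}(s(\bar\sigma)) \big) \bar\sigma, \tag{37}$$

add and subtract the term $\frac{1}{2} \frac{F(\bar\sigma)^2}{G(\bar\sigma)}$ and substitute the value of $\bar c$:

$$\frac{1}{2} \bar c^2 G(\bar\sigma) - \bar c F(\bar\sigma) - U^*(\bar\sigma)\bar\sigma - V^*(\bar\sigma) + x^2 w \phi'\big( \phi^{-1}(s(\bar\sigma)) \big) \bar\sigma; \tag{38}$$

by reordering the terms we obtain

$$\big( \|x - \bar c\|^2 w \phi'\big( \phi^{-1}(s(\bar\sigma)) \big) - U^*(\bar\sigma) \big) \bar\sigma - V^*(\bar\sigma) + \frac{1}{2} \beta \bar c^2 - f \bar c. \tag{39}$$

Considering (10), setting $\epsilon = \|x - \bar c\|^2$ and $\phi'\big( \phi^{-1}(s(\bar\sigma)) \big) = \phi'(\epsilon)$ we obtain:

$$\big[ w \phi'(\epsilon)\epsilon - w \phi'(\epsilon)\epsilon + w \phi(\epsilon) \big] \big[ w \phi(\epsilon) - y \big] - \frac{1}{2} (w \phi(\epsilon) - y)^2 - y (w \phi(\epsilon) - y) + \frac{1}{2} \beta \bar c^2 - f \bar c$$
$$= w^2 \phi(\epsilon)^2 - y w \phi(\epsilon) - \frac{1}{2} (w \phi(\epsilon) - y)^2 - y w \phi(\epsilon) + y^2 + \frac{1}{2} \beta \bar c^2 - f \bar c; \tag{40}$$

by collecting the terms we obtain:

$$(w \phi(\epsilon) - y)^2 - \frac{1}{2} (w \phi(\epsilon) - y)^2 + \frac{1}{2} \beta \bar c^2 - f \bar c, \tag{41}$$

that is

$$\frac{1}{2} \big( w \phi(\|x - \bar c\|^2) - y \big)^2 + \frac{1}{2} \beta \bar c^2 - f \bar c = P(\bar c), \tag{42}$$

which proves the theorem. □

Theorem 3.1 shows that the problem $(\mathcal{P}^d)$ is canonically dual
to the primal $(\mathcal{P})$ in the sense that the duality gap is zero.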

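4. Gaussian function

One of the most used RBFs is the Gaussian function. In
this section we analyze the problem with $\phi(\|x - c\|^2) = \exp\big\{ -\frac{\|x - c\|^2}{2a^2} \big\}$,
where $a$ is a parameter that represents the standard
deviation of the Gaussian function. In the RBFNN formulation
there is normally no linear term $fc$. The primal
problem is:

$$\min P(c) = \frac{1}{2} \Big( w \exp\Big\{ -\frac{\|x - c\|^2}{2a^2} \Big\} - y \Big)^2 + \frac{1}{2} \beta c^2 \tag{43}$$

If we define the quantity $d(c) = \frac{\|x - c\|^2}{2a^2}$, the nonlinear operator
$\xi : \mathbb{R} \to \mathcal{E}_a$ from (8) becomes

$$\xi = w \exp\{ -d(c) \}. \tag{44}$$

The expressions that define $\sigma$, $V$ and $V^*$ are the same as in the
general problem, that is:

• $V(\xi) = \frac{1}{2} (\xi - y)^2$;
• $\sigma = \xi - y$;
• $V^*(\sigma) = \frac{1}{2} \sigma^2 + y \sigma$.

The second order operator $\Lambda_2(c) : \mathbb{R} \to \mathcal{E}_b$ is

$$\epsilon = \Lambda_2(c) = \|x - c\|^2 \tag{45}$$

The second level canonical function becomes

$$U(\epsilon) = w \exp\Big\{ -\frac{\epsilon}{2a^2} \Big\} \tag{46}$$

and the second order duality mapping $\tau$ is

$$\tau = U'(\epsilon) = -\frac{w}{2a^2} \exp\Big\{ -\frac{\epsilon}{2a^2} \Big\} \tag{47}$$

So the Legendre conjugate $U^* : \mathcal{S}_b \to \mathbb{R}$ is

$$U^*(\tau) = -2a^2 \tau \Big( \ln\Big( \frac{-2a^2 \tau}{w} \Big) - 1 \Big) \tag{48}$$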
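The Legendre conjugate of the Gaussian second-level function follows from inverting the duality mapping; the short derivation below writes out the intermediate steps, using only the definitions $U(\epsilon) = w\exp\{-\epsilon/(2a^2)\}$ and $U^*(\tau) = \epsilon\tau - U(\epsilon)$.

```latex
\begin{aligned}
\tau = U'(\epsilon) = -\frac{w}{2a^2}\, e^{-\epsilon/(2a^2)}
\;&\Longrightarrow\; e^{-\epsilon/(2a^2)} = -\frac{2a^2\tau}{w},
\qquad \epsilon = -2a^2 \ln\!\Big(\frac{-2a^2\tau}{w}\Big), \\
U^*(\tau) = \epsilon\tau - U(\epsilon)
&= -2a^2\tau \ln\!\Big(\frac{-2a^2\tau}{w}\Big) - w\Big(-\frac{2a^2\tau}{w}\Big)
= -2a^2\tau\Big[\ln\!\Big(\frac{-2a^2\tau}{w}\Big) - 1\Big].
\end{aligned}
```

The logarithm is well defined because $\tau$ and $w$ have opposite signs, so $-2a^2\tau/w > 0$.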



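The derivative of the exponential function is the exponential
function itself. This simplifies the relation (18) between $\xi$ and
$\tau$, making it linear, that is $\xi = -2a^2 \tau$. The relation between $\sigma$ and
$\tau$ is:

$$\tau = -\frac{\sigma + y}{2a^2} \tag{49}$$

that is also linear. The total complementarity function becomes:

$$\Xi(c, \sigma) = \frac{1}{2} c^2 G(\sigma) - c F(\sigma) - U^*(\sigma)\sigma - V^*(\sigma) - \frac{x^2 (\sigma^2 + y\sigma)}{2a^2} \tag{50}$$

where:

$$G(\sigma) = -\frac{\sigma^2 + y\sigma}{a^2} + \beta,$$
$$F(\sigma) = -\frac{x\sigma^2 + xy\sigma}{a^2},$$
$$U^*(\sigma) = (\sigma + y) \big( \ln(s(\sigma)) - 1 \big),$$
$$s(\sigma) = \frac{\sigma + y}{w}.$$

The dual problem is

$$P^d(\sigma) = -\frac{1}{2} \frac{F(\sigma)^2}{G(\sigma)} - \ln(s(\sigma)) (\sigma^2 + y\sigma) + \frac{1}{2} \sigma^2 - \frac{x^2 (\sigma^2 + y\sigma)}{2a^2} \tag{51}$$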
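The zero duality gap stated by Theorem 3.1 can be checked numerically on the Gaussian dual function (51). The sketch below is only an illustration under assumed parameter values ($x = y = 1$, $w = 2$, $\beta = 0.1$, $a^2 = 0.5$); the helper names and the bisection routine are choices made for this example. It locates a primal critical point $\bar c$, maps it to $\bar\sigma = w\exp\{-d(\bar c)\} - y$, and compares $\bar c$ with $F(\bar\sigma)/G(\bar\sigma)$ and $P(\bar c)$ with $P^d(\bar\sigma)$.

```python
import math

# Assumed example parameters; a2 stands for a^2.
x, y, w, beta, a2 = 1.0, 1.0, 2.0, 0.1, 0.5

def primal(c):
    xi = w * math.exp(-(x - c) ** 2 / (2 * a2))
    return 0.5 * (xi - y) ** 2 + 0.5 * beta * c * c

def primal_grad(c):
    xi = w * math.exp(-(x - c) ** 2 / (2 * a2))
    return (x - c) / a2 * xi * (xi - y) + beta * c

def G(s):
    return beta - (s * s + y * s) / a2

def F(s):
    return -x * (s * s + y * s) / a2

def dual(s):
    q = s * s + y * s
    return (-0.5 * F(s) ** 2 / G(s) - math.log((s + y) / w) * q
            + 0.5 * s * s - x * x * q / (2 * a2))

def bisect(fn, lo, hi, iters=200):
    """Plain bisection; assumes fn(lo) and fn(hi) have opposite signs."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fn(lo) * fn(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

c_bar = bisect(primal_grad, 0.0, 0.3)                        # primal critical point
sigma_bar = w * math.exp(-(x - c_bar) ** 2 / (2 * a2)) - y   # its dual counterpart
center_mismatch = abs(c_bar - F(sigma_bar) / G(sigma_bar))
duality_gap = abs(dual(sigma_bar) - primal(c_bar))
```

Both `center_mismatch` and `duality_gap` should vanish up to floating-point error, which is the numerical counterpart of $\bar c = F(\bar\sigma)/G(\bar\sigma)$ and $P(\bar c) = P^d(\bar\sigma)$.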



The domains of the variables in the primal and dual problems
are:

• $\mathcal{E}_b = \{ \epsilon \in \mathbb{R} \mid \epsilon \geq 0 \}$

• $\mathcal{S}_b = \{ \tau \in \mathbb{R} \mid -\infty < \tau < 0 \}$ if $w > 0$, $\mathcal{S}_b = \{ \tau \in \mathbb{R} \mid 0 < \tau < +\infty \}$ if $w < 0$

• $\mathcal{E}_a = \{ \xi \in \mathbb{R} \mid 0 < \xi \leq w \}$

• $\mathcal{S}_a = \{ \sigma \in \mathbb{R} \mid -y < \sigma \leq w - y \}$ if $w > 0$, $\mathcal{S}_a = \{ \sigma \in \mathbb{R} \mid w - y \leq \sigma < -y \}$ if $w < 0$

Remark 1. Parameters $\beta$, $x$, $y$, and $w$ play important roles
in solving the non-convex problem $(\mathcal{P})$. In the original problem
one searches for the value of $c$ that brings the
term $w \exp\{-d(c)\}$ as close as possible to $y$, that is $\sigma = w \exp\{-d(c)\} - y = 0$.

If $y < 0$ and $w > 0$ or $y > 0$ and $w < 0$ we will have that
$|\sigma| > 0$. This means that in the case of the exponential function,
it would be better to choose $c$ as big as possible in order to
make the exponential go to zero, but the result would never be
satisfactory as the error committed by the approximation would
go close to $-y$ as $c$ goes to infinity. The value $-y$ is not a good
value for the error as it is far from zero. On the other hand if $y$
and $w$ have the same sign and $|y| > |w|$ the value of $c$ will be $x$ in
order to have the exponential equal to 1 and to have the lowest
value for $\sigma = w \exp\{-d(c)\} - y$.

In order to have a realistic problem, we will consider the case
with $y$ and $w$ with the same sign, and with $|y| < |w|$. The cases
with $y, w > 0$ and $y, w < 0$ are equivalent, so we will suppose
that both $y$ and $w$ are positive without loss of generality.



Theorem 4.1. Suppose that $\bar\sigma \in \mathcal{S}_a$ is a critical point of the
dual problem (51) with the corresponding $\bar c = \frac{F(\bar\sigma)}{G(\bar\sigma)} \in \mathbb{R}$ and
that $\bar\sigma \neq -\frac{y}{2}$. Then $\bar c$ is a critical point of the primal problem
and:

$$P^d(\bar\sigma) = P(\bar c). \tag{52}$$

Moreover, there are the following relations between the critical
points of the primal problem and the dual problem:

1. If $(2\bar\sigma + y) > 0$ and $G(\bar\sigma) > 0$, or $(2\bar\sigma + y) < 0$ and $G(\bar\sigma) < 0$,
then if $\bar\sigma$ is a local minimum of the dual problem, the corresponding
$\bar c$ is a local maximum of the primal problem; if $\bar\sigma$
is a local maximum of the dual problem the corresponding
$\bar c$ is a local minimum of the primal problem;

2. If $(2\bar\sigma + y) > 0$ and $G(\bar\sigma) < 0$, or $(2\bar\sigma + y) < 0$ and $G(\bar\sigma) > 0$,
then if $\bar\sigma$ is a local minimum of the dual problem the corresponding
$\bar c$ is a local minimum of the primal problem; if $\bar\sigma$
is a local maximum of the dual problem the corresponding
$\bar c$ is a local maximum of the primal problem.

Let $x_o = \sqrt{-2a^2 \ln\big( \frac{y}{2w} \big)}$. If $\bar\sigma = -\frac{y}{2}$, then there is a corresponding
critical point to $\bar\sigma$ in the primal problem if and only
if the parameters $x$, $y$, $\beta$ and $w$ satisfy one of the two following
conditions:

$$\beta x + \Big( \beta + \frac{y^2}{4a^2} \Big) x_o = 0 \quad \text{or} \quad \beta x - \Big( \beta + \frac{y^2}{4a^2} \Big) x_o = 0 \tag{53}$$

and the corresponding critical point $\bar c$ in the primal problem is
always a local minimum. If neither of conditions (53) is satisfied,
$\bar\sigma = -\frac{y}{2}$ is always a critical point of the dual problem, but
it does not have any corresponding critical point in the primal
problem.

Proof 4.1. The first order derivative for the dual problem is:

$$P^d(\sigma)' = -\Big[ \Big( x - \frac{F(\sigma)}{G(\sigma)} \Big)^2 \frac{1}{2a^2} + \ln(s(\sigma)) \Big] [2\sigma + y] \tag{54}$$

so the term (28) is, up to a nonzero constant factor, equal to $2\bar\sigma + y$. If $\bar\sigma \neq -\frac{y}{2}$ the critical point
equivalency and condition (52) are consequences of Theorem
3.1.

To prove statements (1) and (2) we use the second order derivatives
of the problems $P(c)$ and $P^d(\sigma)$:

$$P(c)'' = \frac{(x - c)^2}{a^4} w \exp\{-d(c)\} \big( 2 w \exp\{-d(c)\} - y \big) + \beta - \frac{1}{a^2} w \exp\{-d(c)\} \big( w \exp\{-d(c)\} - y \big) \tag{55}$$

$$P^d(\sigma)'' = -\Big( x - \frac{F(\sigma)}{G(\sigma)} \Big)^2 \Big( \frac{(2\sigma + y)^2}{a^4 G(\sigma)} + \frac{1}{a^2} \Big) - \frac{2\sigma + y}{\sigma + y} - 2 \ln(s(\sigma)). \tag{56}$$

Since $\bar\sigma$ is a critical point of the dual, we have that $P^d(\bar\sigma)' = 0$.
Therefore when $\bar\sigma \neq -\frac{y}{2}$:

$$\Big( x - \frac{F(\bar\sigma)}{G(\bar\sigma)} \Big)^2 = -2a^2 \ln(s(\bar\sigma)) \tag{57}$$

By using condition (57) in (56) we obtain:

$$P^d(\bar\sigma)'' = (2\bar\sigma + y) \Big( \frac{2 \ln(s(\bar\sigma)) (2\bar\sigma + y)}{a^2 G(\bar\sigma)} - \frac{1}{\bar\sigma + y} \Big) \tag{58}$$

Noticing that $\sigma = w \exp\{-d(c)\} - y$, it is possible to rewrite $P(c)''$
in terms of $\bar\sigma$, i.e.:

$$P(c(\bar\sigma))'' = G(\bar\sigma) + \frac{1}{a^4} (\bar\sigma + y)(2\bar\sigma + y) \Big( x - \frac{F(\bar\sigma)}{G(\bar\sigma)} \Big)^2 \tag{59}$$

By using again condition (57) we obtain:

$$P(c(\bar\sigma))'' = \frac{1}{a^2} \big[ a^2 G(\bar\sigma) - 2 (\bar\sigma + y)(2\bar\sigma + y) \ln(s(\bar\sigma)) \big] \tag{60}$$

so it is possible to rewrite equation (58) in the following form:

$$P^d(\bar\sigma)'' = -\frac{2\bar\sigma + y}{G(\bar\sigma)(\bar\sigma + y)} P(c(\bar\sigma))'' \tag{61}$$

and to find the relations reported in Table 1. From these relations,
we obtain:

• If $(2\bar\sigma + y) > 0$ and $G(\bar\sigma) > 0$, or $(2\bar\sigma + y) < 0$ and $G(\bar\sigma) < 0$,
then the second order derivative of the primal problem and
the second order derivative of the dual problem have opposite
signs at their critical points;

• If $(2\bar\sigma + y) > 0$ and $G(\bar\sigma) < 0$, or $(2\bar\sigma + y) < 0$ and $G(\bar\sigma) > 0$,
then the second order derivative of the primal problem and
the second order derivative of the dual problem have the
same sign at their critical points.

This proves statements 1 and 2.

Table 1: Relations between the second order derivatives of the primal problem
and dual problem

    (2σ̄ + y)    G(σ̄)    P(c(σ̄))''    P^d(σ̄)''
      > 0        > 0         ±             ∓
      > 0        < 0         ±             ±
      < 0        < 0         ±             ∓
      < 0        > 0         ±             ±

The point $\bar\sigma = -\frac{y}{2}$ is a critical point of $P^d$ according to the
second factor of (54). The point $\bar c$ corresponding to $\bar\sigma = -\frac{y}{2}$
is a critical point of the primal problem if and only if $P'(\bar c) = 0$.
We can use (10) to find the relation between $\bar\sigma$ and $\bar c$, that is:

$$\bar\sigma = \xi - y \ \Rightarrow\ \bar\sigma = w \exp\{-d(\bar c)\} - y \tag{62}$$

$$\bar c = x \pm \sqrt{-2a^2 \ln(s(\bar\sigma))}. \tag{63}$$

For $\bar\sigma = -\frac{y}{2}$ we obtain:

$$\bar c = x \pm x_o. \tag{64}$$

Substituting these values in the first order derivative of the primal
problem:

$$P'(c) = \frac{x - c}{a^2} w \exp\{-d(c)\} \big( w \exp\{-d(c)\} - y \big) + \beta c \tag{65}$$

and considering that $w \exp\{-d(\bar c)\} = \bar\sigma + y = \frac{y}{2}$ and
$w \exp\{-d(\bar c)\} - y = \bar\sigma = -\frac{y}{2}$ we obtain that the primal problem
has a critical point at $\bar c$ corresponding to the critical $\bar\sigma = -\frac{y}{2}$ if
and only if:

$$\beta x \pm \Big( \beta + \frac{y^2}{4a^2} \Big) x_o = 0. \tag{66}$$

This happens only for a particular configuration of the parameters
$w$, $\beta$, $x$ and $y$ that makes one of the roots of the first factor of
the derivative (54):

$$\Big( x - \frac{F(\sigma)}{G(\sigma)} \Big)^2 \frac{1}{2a^2} + \ln(s(\sigma)) \tag{67}$$

be in $\sigma = -\frac{y}{2}$.

To prove that at $\bar\sigma = -\frac{y}{2}$ the critical point of the dual problem
corresponds to a minimum point of the primal problem we plug
the value of $\bar\sigma = -\frac{y}{2}$ in (59) and obtain

$$P(\bar c)'' = \beta + \frac{y^2}{4a^2} \tag{68}$$

which is always a positive value. □



Remark 2. From now on we will refer to the critical point
$\sigma_f = -\frac{y}{2}$ as the pseudo dual critical point, as it is a critical point of
the dual problem that generally does not have a corresponding
critical point for the primal problem.
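The behaviour described in Remark 2 can be observed numerically. The sketch below uses assumed parameter values ($x = y = 1$, $w = 2$, $\beta = 0.1$, $a^2 = 0.5$) and a central finite difference, both chosen only for illustration: the slope of the Gaussian dual function at $\sigma_f = -y/2$ is numerically zero, while the primal derivative at the two candidate centers $x \pm x_o$ is clearly nonzero.

```python
import math

# Assumed example parameters; a2 stands for a^2.
x, y, w, beta, a2 = 1.0, 1.0, 2.0, 0.1, 0.5

def dual(s):
    """Gaussian canonical dual function P^d(sigma)."""
    q = s * s + y * s
    G = beta - q / a2
    F = -x * q / a2
    return (-0.5 * F * F / G - math.log((s + y) / w) * q
            + 0.5 * s * s - x * x * q / (2 * a2))

def primal_grad(c):
    """First derivative of the Gaussian primal function P(c)."""
    xi = w * math.exp(-(x - c) ** 2 / (2 * a2))
    return (x - c) / a2 * xi * (xi - y) + beta * c

sigma_f = -y / 2
h = 1e-5
# Central finite-difference estimate of the dual slope at sigma_f.
dual_slope = (dual(sigma_f + h) - dual(sigma_f - h)) / (2 * h)

x_o = math.sqrt(-2 * a2 * math.log(y / (2 * w)))
primal_slopes = [primal_grad(x - x_o), primal_grad(x + x_o)]
```

For these values neither candidate center is a primal critical point, so $\sigma_f$ is a critical point of the dual only.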

4.1. Choice of the critical point

In order to find the best solution among the critical points of
problem (43) we introduce the following feasible spaces:

$$\mathcal{S}_a^+ = \{ \sigma \in \mathcal{S}_a \mid G(\sigma) > 0 \} \tag{69}$$

$$\mathcal{S}_a^- = \{ \sigma \in \mathcal{S}_a \mid G(\sigma) < 0 \} \tag{70}$$

The following theorem explains the relations between the critical
points:

Theorem 4.2. Suppose that the points $\bar\sigma_1 \in \mathcal{S}_a^+$ and $\bar\sigma_2 \in \mathcal{S}_a^-$
are critical points of the dual problem, that $\bar\sigma_i \neq -\frac{y}{2}$ for $i = 1, 2$
and that $\bar c_1$ and $\bar c_2$ are the corresponding critical points of
the primal problem. Then if both $\bar c_1$ and $\bar c_2$ are local minima
or local maxima of the primal problem, the following relation
always holds:

$$P(\bar c_1) = P^d(\bar\sigma_1) < P(\bar c_2) = P^d(\bar\sigma_2) \tag{71}$$

Proof 4.2. This theorem is a consequence of the first theorem
in triality theory [?]. □

Remark 3. The pseudo critical point $\sigma_f = -\frac{y}{2}$ is always in $\mathcal{S}_a^+$.

From the results in Theorem 4.2 it is always better to search
for the dual critical point in $\mathcal{S}_a^+$ that corresponds to a minimum
in the primal problem. In order to characterize the solutions in
$\mathcal{S}_a^+$ and the domains in which to search for the best solution, two
theorems are proposed in the following:

Theorem 4.3. Let $\sigma_f = -\frac{y}{2}$ be the pseudo critical point of the
dual problem, $x_o = \sqrt{-2a^2 \ln\big( \frac{y}{2w} \big)}$, $x$ positive. Then:

• if $x \in (0, x_o)$ then $\sigma_f$ is always a local minimum of $P^d(\sigma)$;

• if $x > x_o$ then:

1. if $\beta > 0$ and $\beta < \frac{y^2 x_o}{4a^2 (x - x_o)}$, $\sigma_f$ is a local minimum for
the dual problem;

2. if $\beta > 0$ and $\beta > \frac{y^2 x_o}{4a^2 (x - x_o)}$, $\sigma_f$ is a local maximum for
the dual problem;

3. if $\beta > 0$ and $\beta = \frac{y^2 x_o}{4a^2 (x - x_o)}$, $\sigma_f$ is an inflection point in
which the first order derivative is zero and that corresponds
to a local minimum of the primal problem.

Proof 4.3. In order to understand whether $\sigma_f = -\frac{y}{2}$ is a minimum
or a maximum for the dual we have to plug its value in the second
order derivative of $P^d(\sigma)$, that is equation (56), and analyze
its sign. After the substitution we obtain

$$P^d(\sigma_f)'' = -\Big[ 2 \ln\Big( \frac{y}{2w} \Big) + \frac{1}{a^2} \Big( \frac{x\beta}{\beta + \frac{y^2}{4a^2}} \Big)^2 \Big] \tag{72}$$

The first order derivative in $\beta$ of (72) is $-\frac{x^2 y^2 \beta}{2a^4 \big( \beta + \frac{y^2}{4a^2} \big)^3}$, that is, the
function is monotonically decreasing in $\beta$. The value of (72) in
$\beta = 0$ is $-2 \ln\big( \frac{y}{2w} \big)$, which is positive. If we make $\beta$ go to $+\infty$ we
obtain:

$$\lim_{\beta \to +\infty} -\Big[ 2 \ln\Big( \frac{y}{2w} \Big) + \frac{1}{a^2} \Big( \frac{x\beta}{\beta + \frac{y^2}{4a^2}} \Big)^2 \Big] = -2 \ln\Big( \frac{y}{2w} \Big) - \frac{x^2}{a^2} \tag{73}$$

that is, the second order derivative of $P^d(\sigma)$ in $\sigma_f$ is non-negative
for any value of $\beta \geq 0$ if

$$x \in [-x_o, x_o] \tag{74}$$

If $x$ does not satisfy this condition, from (72) we have that
the second order derivative of the dual problem is positive in $\sigma_f$
if $\beta$ satisfies:

$$-\frac{y^2 x_o}{4a^2 (x + x_o)} < \beta < \frac{y^2 x_o}{4a^2 (x - x_o)} \tag{75}$$

On the other hand if:

$$\beta < -\frac{y^2 x_o}{4a^2 (x + x_o)} \quad \text{or} \quad \beta > \frac{y^2 x_o}{4a^2 (x - x_o)} \tag{76}$$

there will be a local maximum in $\sigma_f$. As $x$ is considered positive,
the term $-\frac{y^2 x_o}{4a^2 (x + x_o)}$ is always negative, so $\beta$ will always be
greater than it.

If the condition $\beta = \frac{y^2 x_o}{4a^2 (x - x_o)}$ is satisfied, the critical point $\sigma_f$ is
an inflection point that also satisfies the first order condition and
it has a corresponding minimum point in the primal problem by
Theorem 4.1. □
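The case distinction of Theorem 4.3 translates directly into a small decision routine. The sketch below is illustrative: the function name, the returned labels and the tolerance used for the equality case are choices made here, and the value $a = 1/\sqrt{2}$ in the usage lines is an assumption.

```python
import math

def classify_pseudo_point(x, y, w, a, beta, rel_tol=1e-12):
    """Classify sigma_f = -y/2 for the dual problem, for x > 0 and beta > 0.

    Requires 0 < y < 2w so that x_o is real.
    """
    if not (x > 0 and beta > 0 and 0 < y < 2 * w):
        raise ValueError("expected x > 0, beta > 0 and 0 < y < 2w")
    x_o = math.sqrt(-2 * a * a * math.log(y / (2 * w)))
    if x < x_o:
        return "local minimum"
    if x == x_o:
        return "not covered by the theorem"
    # Threshold on beta for the case x > x_o.
    threshold = y * y * x_o / (4 * a * a * (x - x_o))
    if math.isclose(beta, threshold, rel_tol=rel_tol):
        return "inflection point"
    return "local minimum" if beta < threshold else "local maximum"

a_assumed = 1 / math.sqrt(2)  # assumed width, for illustration only
small_x = classify_pseudo_point(1.0, 1.0, 2.0, a_assumed, 0.1)
large_x_large_beta = classify_pseudo_point(8.0, 1.0, 2.0, a_assumed, 0.25)
large_x_small_beta = classify_pseudo_point(4.0, 1.0, 2.0, a_assumed, 0.1)
```

Because $\beta$ is the quantity a practitioner can tune, a routine of this kind can be used to check in advance whether a candidate $\beta$ keeps $\sigma_f$ a local minimum of the dual.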



Remark 4. In the case of $x$ negative, the conditions
change in the following way:

• if $x \in (-x_o, 0)$ then $\sigma_f$ is always a local minimum of $P^d(\sigma)$;

• if $x < -x_o$ then:

1. if $\beta > 0$ and $\beta < -\frac{y^2 x_o}{4a^2 (x + x_o)}$, $\sigma_f$ is a local minimum for
the dual problem;

2. if $\beta > 0$ and $\beta > -\frac{y^2 x_o}{4a^2 (x + x_o)}$, $\sigma_f$ is a local maximum for
the dual problem;

3. if $\beta > 0$ and $\beta = -\frac{y^2 x_o}{4a^2 (x + x_o)}$, $\sigma_f$ is an inflection point in
which the first order derivative is zero and that corresponds
to a local minimum of the primal problem.

The proof of these statements is similar to that of Theorem 4.3
and is omitted.

Remark 5. Theorem 4.3 shows the effects of the parameter $\beta$
on the pseudo critical point $\sigma_f$. Similar effects can also be
obtained with respect to $y$, $x$, $a$, and $w$. The reason we choose $\beta$
is that it is a hyper-parameter that can be chosen by the
practitioner before performing the optimization.

For the next theorem, we introduce the two following subsets
of $\mathcal{S}_a^+$:

$$\mathcal{S}_\flat^+ = \Big\{ \sigma \in \mathcal{S}_a^+ \ \Big|\ \sigma < -\frac{y}{2} \Big\} \tag{77}$$

$$\mathcal{S}_\sharp^+ = \Big\{ \sigma \in \mathcal{S}_a^+ \ \Big|\ \sigma > -\frac{y}{2} \Big\} \tag{78}$$

Theorem 4.4. Let $\sigma_f = -\frac{y}{2}$ be the pseudo critical point in the
dual problem and let the primal problem have a maximum of
five critical points. Then

• if $\sigma_f$ is a local minimum for the dual function, there will
be a local maximum in $\mathcal{S}_\sharp^+$ that corresponds to a minimum
of the primal problem;

• if $\sigma_f$ is a local maximum then:

1. there are no critical points in $\mathcal{S}_\sharp^+$;

2. there is at least one critical point in $\mathcal{S}_\flat^+$.

Proof 4.4. In the dual problem there must be a singularity point
at $G(\sigma) = 0$ where $P^d(\sigma)$ goes to $-\infty$, so if $\sigma_f$ is a local minimum, there
must be a local maximum in $\mathcal{S}_\sharp^+$.

If $\sigma_f$ is a local maximum, we prove condition (1) by negating
the thesis and supposing that there is at least one critical point
in $\mathcal{S}_\sharp^+$. As $P^d(\sigma)$ goes to $-\infty$ if $G(\sigma) \to 0$, there will be not
one, but two critical points in $\mathcal{S}_\sharp^+$, a local minimum $\bar\sigma_1$ and a
local maximum $\bar\sigma_2$ with the relation $P^d(\bar\sigma_1) < P^d(\bar\sigma_2)$. By
Theorems 4.1 and 4.2, $\bar\sigma_1$ corresponds to the second highest
local maximum $\bar c_1$ of the primal function, and $\bar\sigma_2$ corresponds
to the lowest or second lowest local minimum $\bar c_2$ of the primal
function, that is, the relation $P(\bar c_2) < P(\bar c_1)$ is satisfied. By
Theorem 3.1 we have:

$$P^d(\bar\sigma_1) < P^d(\bar\sigma_2) = P(\bar c_2) < P(\bar c_1) = P^d(\bar\sigma_1) \tag{79}$$

that is a contradiction.

To prove condition (2), it is sufficient to notice that if there are
no critical points in $\mathcal{S}_\sharp^+$, by the triality theory there must be at
least one critical point corresponding to the global minimum in
$\mathcal{S}_a^+$, and this point will be in $\mathcal{S}_\flat^+$. □




Figure 1: Dual algebraic curves with y = 1, w = 2, a = [illegible] and β = 0.1 with
respect to the internal input x



Depending on the parameters, the primal problem (43) can
have at most five critical points. There are several cases:

Case 1: Three critical points for $P(c)$ and four critical points
for $P^d(\sigma)$, two critical points in $\mathcal{S}_a^+$ and two critical points in
$\mathcal{S}_a^-$, with $\sigma_f$ as local minimum. The values of the parameters
are $y = 1$, $x = 1$, $w = 2$, $a$ = [illegible] and $\beta = 0.1$ (see Figure 2).
This case can be easily solved with the general canonical duality
framework [?], as the local maximum in the part of $\mathcal{S}_a^+$ with
$\sigma > -\frac{y}{2}$ corresponds to the global minimum of the problem, and
the local minimum and maximum in $\mathcal{S}_a^-$ correspond to the local
minimum and maximum in the primal problem.

Figure 2: Primal (in blue) and dual (in red) functions for Case 1 with three
critical points

Case 2: Five critical points for $P(c)$, six critical points for
$P^d(\sigma)$. The values of the parameters are $y = 1$, $x = 4$, $w = 2$,
$a$ = [illegible] and $\beta = 0.1$ (see Figure 3). Notice that the only
parameter that changed with respect to Case 1 is $x$. With these
parameters the problem becomes multi-welled. The two critical
points with the lowest value of the objective function belong
to the same double well and their corresponding critical points
are in $\mathcal{S}_a^+$. The critical point $\sigma = -0.999999$ of $P^d(\sigma)$
corresponds to the second best minimizer $c = 0.00002$ of the
primal problem, and this $\sigma$ is situated near the boundary of $\mathcal{S}_a^+$,
which is visible in Figure 4. It is also possible, for certain values
of the parameters, that the local minimum on the boundary
of $\mathcal{S}_a$ corresponds to the global minimum of the problem (see
Figure 5). In this case the choice of the value for $\sigma$ should be
the critical point near the boundary. This critical point corresponds
to a critical point in the primal with the value of $c$ near
zero. This critical point is generated by the term $\frac{1}{2} \beta c^2$, that is
the regularization term used to make the objective function coercive
and more regular. On the other hand, this term does not
have anything to do with the original aim of the problem. This
point near zero in the primal function will always have the
corresponding dual critical point near the boundary, because as $c$
gets close to zero, $\sigma = w \exp\{-d(c)\} - y$ gets close to $-y$. We
also consider that $\sigma = w \exp\{-d(c)\} - y$ is the error that originally
we want to minimize in problem (6) and that the critical
point on the boundary will always have a $\sigma$ with an absolute
value bigger than the other critical point closer to $\sigma = 0$. In
other words the local minimum on the boundary has nothing to
do with the original problem, has a high value of the error and
should not be considered a good solution. In order to find
the optimal solution for the original problem, the local minimum
in the primal problem corresponding to the critical point
closer to zero in $\mathcal{S}_a^+$ is preferable. By reducing the value of $\beta$ it
is possible not only to make the critical point near $c = 0$ into a
local minimum, but also to assure that $\sigma_f$ is a local minimum.
In this way there is a critical point in the part of $\mathcal{S}_a^+$ with
$\sigma > -\frac{y}{2}$ and the domain of the
solution is well defined. Basically, if the critical point near the
boundary of $\mathcal{S}_a^+$ is the global minimum, a very big value of $\beta$
has been chosen.

Figure 3: Primal (in blue) and dual (in red) functions for Case 2 with five critical
points in the primal and six critical points in the dual.

Figure 4: Critical point on the boundary of the dual function feasible set for
Case 2.

Figure 5: Dual problem in the case of β = 0.12. The minimum near
the boundary, σ̄_1, is a global minimum.

Case 3: Three critical points for $P(c)$ and four critical points
for $P^d(\sigma)$, all belonging to $\mathcal{S}_a^+$. The values of the parameters
are $y = 1$, $x = 4$, $w = 2$, $a$ = [illegible] and $\beta = 0.22$ (see Figure 6).
This case is similar to the previous one, and the solution of the
dual problem should be the critical point that corresponds to a
minimum in the primal problem with the value of $\sigma$ closer to
zero.

Figure 6: Primal (in blue) and dual (in red) functions for Case 3 with three
critical points in the primal and four critical points in $\mathcal{S}_a^+$.

Case 4: Three critical points in the primal and four critical
points in the dual, but with two critical points in $\mathcal{S}_a^+$, two critical
points in $\mathcal{S}_a^-$ and $\sigma_f$ as local maximum. The values of the
parameters are $y = 1$, $x = 8$, $w = 2$, $a$ = [illegible] and $\beta = 0.25$ (see
Figure 7). If the value of the hyper-parameter $\beta$ is reduced it is
possible to make $\sigma_f$ into a local minimum and return to one of
the previous cases.

Figure 7: Primal (in blue) and dual (in red) functions for Case 4 with three
critical points in the primal, two critical points in $\mathcal{S}_a^+$, two critical points
in $\mathcal{S}_a^-$ and $\sigma_f$ as a local maximum.

Case 5: One critical point in the primal problem and two
critical points in the dual problem. This case occurs when the
quadratic term with $\beta$ dominates the error function $W(c)$. If
this case occurs, it means that the value of $\beta$ is too big and the
problem is no longer related to the original one, so one should
choose a smaller value of $\beta$ to have a problem related to the
original.

Based on the study of these cases, we can obtain the general
idea to find the best solution, i.e. the hyper-parameter $\beta$ should
be set to a value that satisfies condition (75) in order to have
$\sigma_f$ as a local minimum, and then one searches for the critical point in the
part of $\mathcal{S}_a^+$ with $\sigma > -\frac{y}{2}$.
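The number of primal critical points in the different cases can be explored by scanning the sign of $P'(c)$ on a grid. The sketch below is a rough numerical aid, not the procedure used in the paper: the grid bounds, the step and the function names are choices made here, and because the value of $a$ is not legible in this copy, $a = 1/\sqrt{2}$ is used purely as an assumption.

```python
import math

def primal_grad(c, x, y, w, a, beta):
    """First derivative of the Gaussian primal function P(c)."""
    xi = w * math.exp(-(x - c) ** 2 / (2 * a * a))
    return (x - c) / (a * a) * xi * (xi - y) + beta * c

def count_critical_points(x, y, w, a, beta, lo=-5.0, hi=10.0, n=15000):
    """Count sign changes of P'(c) on a uniform grid over [lo, hi]."""
    step = (hi - lo) / n
    count = 0
    prev = primal_grad(lo, x, y, w, a, beta)
    for i in range(1, n + 1):
        cur = primal_grad(lo + i * step, x, y, w, a, beta)
        if prev * cur < 0:   # a sign change brackets a critical point
            count += 1
        prev = cur
    return count

a_assumed = 1 / math.sqrt(2)  # assumption, see the lead-in
n_small_x = count_critical_points(1.0, 1.0, 2.0, a_assumed, 0.1)
n_large_x = count_critical_points(4.0, 1.0, 2.0, a_assumed, 0.1)
```

Under this assumed width the scan finds three critical points for $x = 1$ and five for $x = 4$ with $\beta = 0.1$, matching the counts reported for Case 1 and Case 2. A grid scan can miss critical points that are closer together than the step, so it is only a quick diagnostic.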



5. Conclusions

In this paper we have presented an application of the canonical
duality theory to function approximation using Radial Basis
Functions. By using the sequential canonical dual transformation,
the non-convex problem with a general RBF function $\phi(\cdot)$
is reformulated in a canonical dual form. An associated strong
duality theorem is also proposed.

Applications to one of the most used RBFs, the exponential function,
are illustrated. Due to the particular properties of the exponential
function, we are able to find a linear relation between the
dual variables, which leads to an explicit form of the canonical
dual problem. We also found conditions on the hyper-parameter
$\beta$ in order to obtain a reliable domain in which to search for the
best solution. This research reveals an important phenomenon
in complex systems, i.e. the global optimal solution may not be
the best solution to the problem considered.

There are still several open topics on the application of the
canonical duality theory to Radial Basis Error functions. For
example, there are other kinds of RBF that can be analyzed, like
the multiquadric and the inverse multiquadric functions.
A further development for future research is to extend the
one-dimensional case to the multidimensional case, also
considering $w$ as a variable and not as a parameter. When this case
is analyzed, we will be able to realize RBF neural networks
based on canonical duality theory.